05:00
This slide deck was built in Quarto!
You can click on a table’s name to open it to the right. Take a look at the first three tabs available, which will be available for every table.
Look through the tables in the Broadstreet ADI data and the CDC Births data to understand what the data is capturing, and identify the following:
05:00
This is the pattern we’ll come back to again and again in SQL (Structured Query Language):
SELECT + FROM + WHERE;
| cabin | high_score | wins | losses |
|---|---|---|---|
| Yellow | 10 | 3 | 0 |
| Blue | 7 | 2 | 1 |
| Orange | 9 | 1 | 2 |
For example: SELECT cabin, high_score FROM camp_kickball_tournament WHERE high_score < 10
Aliases allow you to (re)name things:
SELECT
cabin,
high_score,
wins/(wins + losses) AS prop_win
FROM camp_kickball_tournament
WHERE high_score < 10;
Which is easier to read?
SELECT
cabin,
high_score,
wins/(wins + losses) AS prop_win
FROM camp_kickball_tournament
WHERE high_score < 10;
select cabin,high_score,wins/(wins + losses) as prop_win from camp_kickball_tournament where high_score < 10;
Use the “Query” button (or the one with the magnifying glass) when you’re looking within a table to avoid annoying syntax issues like having to type fully qualified table names (project.dataset.table).
Once you click “Query Table” (and then “New Tab” or whatever you prefer), you’ll see a partial query, with the cursor located so that you can add fields after SELECT.
You’ll also see warnings that your query is invalid. That’s expected!
sdoh_cdc_wonder_natality) called county_natality. * Click on the “query” button and choose “In split tab”.03:00
Let’s use the AI assistant to learn some SQL!
I asked you before to find three things, and now we’ll use them in a JOIN in SQL.
What to join?
What constitutes a join?
Which type of join / data completion
INNER JOIN (or just JOIN)
LEFT (or LEFT OUTER) JOIN
RIGHT (or RIGHT OUTER) JOIN
FULL (or FULL OUTER) JOIN
The basic syntax of a join is:
SELECT [fields we want]
FROM [left table name] [some join type] [right table name]
ON (or USING) [join criteria]
We have multiples of counties, with varying ADI scores, in our query results. That’s not great – we want each county to appear only once.
What’s going on? Dates.
Gemini gets us the start of a query, but it won’t quite work. But it does give you a nice look at aliasing!
::: notes So, I start by asking Gemini to help. This is how I phrased the question. I’d like for you to click on the Gemini pencil icon and ask a question similar to mine, or you can put it in your own words. :::
This is what Gemini gave me:
SELECT
t1.Ave_Birth_Weight_gms,
t2.area_deprivation_index_percent
FROM
`bigquery-public-data.sdoh_cdc_wonder_natality.county_natality` AS t1
INNER JOIN `bigquery-public-data.broadstreet_adi.area_deprivation_index_by_county` AS t2 ON t1.County_of_Residence_FIPS = t2.county_fips_code
AND t1.Year = t2.year
Error!
No matching signature for operator = for argument types: DATE, INT64. Supported signature: ANY = ANY at [7:10]
DATE is a type that has year, day, and month. But if I just have a four digit year, that’s an integer (INT64).
I can’t compare an INT column that has “2019” to a DATE column that has “2019-01-01”
Include what you’re trying to do and “in BigQuery.”
In our case, EXTRACT seems likely to work.
SELECT
cdc.Ave_Birth_Weight_gms,
adi.area_deprivation_index_percent
FROM
`bigquery-public-data.sdoh_cdc_wonder_natality.county_natality` AS cdc
INNER JOIN `bigquery-public-data.broadstreet_adi.area_deprivation_index_by_county` AS adi
ON cdc.County_of_Residence_FIPS = adi.county_fips_code AND
EXTRACT(YEAR FROM cdc.Year) = adi.year
SELECT
cdc.Ave_Birth_Weight_gms,
adi.area_deprivation_index_percent,
cdc.County_of_Residence,
adi.county_name,
cdc.Year,
adi.year
FROM
`bigquery-public-data.sdoh_cdc_wonder_natality.county_natality` AS cdc
INNER JOIN `bigquery-public-data.broadstreet_adi.area_deprivation_index_by_county` AS adi
ON cdc.County_of_Residence_FIPS = adi.county_fips_code AND
EXTRACT(YEAR FROM cdc.Year) = adi.year
One last thing – we now are matching on year, but what if we have data on 2018, 2019, and 2020 for a given county, in both tables? We’ll match that data up properly, but some counties might be over-represented in our analysis.
In your Query results, you can save your work. In our case, we want to save it as a BigQuery table, in our project.
::: notes Building up a query is often iterative and difficult. You might want to save your queries to keep working on them later, tweaking them to get the best results or asking for help from someone who knows more SQL and can explain what’s going wrong. You can copy/paste your SQL into a document somewhere, sure, but you can also save them within BigQuery. In the query window, there’s a Save button. Click “Save query” and give it a good English language description. You can write a sentence here, it doesn’t have to be pithy like a table name should be. Choose a region that’s close to you, geographically. :::
::: notes Now look back in your Explorer and expand your project name and your dataset name and the Queries. Hopefully you see something like what’s on your screen – you have some saved assets in your project now, including a table of data and a query. If you don’t see this, you might have to toggle the “Show Starred Only” button and then put a star on your project, to make sure the resources in your project show up when “starred only” is selected. :::
Joy Payton, Children’s Hospital of Philadelphia